A Practical Approach to Language Complexity: A Wikipedia Case Study
Authors
Abstract
In this paper we present a statistical analysis of English texts from Wikipedia. We address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of the two samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily because it uses shorter sentences, not because its syntax or vocabulary is drastically simplified. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, namely that the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
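As a rough illustration of the readability measure mentioned in the abstract, the Gunning fog index is commonly defined as 0.4 × (average sentence length in words + percentage of words with three or more syllables). The sketch below is not the authors' implementation: the function names `count_syllables` and `gunning_fog` are hypothetical, the syllable counter is a crude vowel-group heuristic, and the usual exclusions (proper nouns, compound words, common suffixes) are ignored.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as the number of vowel groups (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def gunning_fog(text: str) -> float:
    """Return 0.4 * (words per sentence + percent of 'complex' words)."""
    # Split on sentence-final punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    # A word counts as complex here if the heuristic finds 3+ syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]

    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)


if __name__ == "__main__":
    short = "The cat sat. The dog ran."
    long_ = "The cat sat and the dog ran."
    print(gunning_fog(short), gunning_fog(long_))
```

With the same words, merging two short sentences into one raises the score in this sketch, which is the kind of sentence-length effect the abstract attributes to the difference between Simple and Main.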
Similar resources
Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, knowledge repositories are inevitably needed to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
Translator Education in the Light of Complexity Theory: A Case of Iran’s Higher Education System
In the fast-growing world of translation studies, many students may not receive adequate training at universities. A new multi-faceted approach therefore needs to be applied in translator education programs to meet the students’ needs and professional expectations. In order to describe the complex interrelations in translator education systems and propose a research framework that takes ...
Development of Fluency, Accuracy, and Complexity in Productive Skills of EFL learners across Gender and Proficiency: A Chaos Complexity Approach
This study was an attempt to investigate the developmental rate of fluency, accuracy and complexity among 12 EFL learners within the framework of chaos complexity theory. To carry out this study, 6 female and 6 male participants at two levels of proficiency (pre- and upper-intermediate) were put in two classes taught by the same teacher and following the same course. Every two months (for a peri...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
A Navigation System for Autonomous Robot Operating in Unknown and Dynamic Environment: Escaping Algorithm
In this study, the problem of navigation in a dynamic and unknown environment is investigated and a navigation method based on a force field approach is suggested. It is assumed that the robot performs navigation in...
Journal title:
Volume 7, Issue
Pages -
Publication date: 2012